도서관에서 책이 도착일 순서로 정리되지 않고, 하나의 유니버설 키으로 정리된다고 상상해 보세요. 이것은 순차적 접근에서 연관 컨테이너로의 패러다임 전환입니다. 선형적으로 벡터 를 스캔하는 대신, 우리는 맵 또는 셋 을 사용하여 로그 시간 복잡도($O(\log n)$)의 조회를 달성합니다.
1. 연관의 본질
맵에서는 맵키-값 쌍을 저장합니다. 키는 문자열, 사용자 정의 객체 또는 키-값 쌍Strict Weak Ordering을 지원하는 어떤 타입이든 될 수 있는 인덱스 역할을 합니다. 반대로, 엄격한 약한 순서셋은 유일한 키만 저장하므로, 멤버십 확인이나 필터링에 이상적인 도구입니다. 셋반대로, 유일한 키만 저장하여 멤버십 테스트나 필터링에 최적의 도구가 됩니다.
2. 정렬 여부에 따른 차이
표준 컨테이너(맵, 셋)는 키를 정렬된 상태로 유지합니다. C++11 표준 은 무작위 변형(unordered_map)을 도입했는데, 이는 평균 $O(1)$ 성능을 위해 해시 함수 를 사용합니다. 이러한 고성능 버킷을 사용할 때는 C++11 로고 를 찾아보세요.
main.py
TERMINALbash — 80x24
> Ready. Click "Run" to execute.
>
QUESTION 1
What happens when you use the subscript operator
m[k] on a map if the key k does not exist?It throws an out_of_range exception.
It returns a null pointer.
It adds a new element with key k and a value-initialized value.
The program fails to compile.
✅ Correct!
Subscripting a map behaves differently from a vector; it automatically inserts missing keys.❌ Incorrect
Unlike at(), the [] operator is an insertion-on-demand mechanism.QUESTION 2
Which of the following calls is ILLEGAL for a multiset
c and a vector v?copy(v.begin(), v.end(), inserter(c, c.end()));copy(v.begin(), v.end(), back_inserter(c));copy(c.begin(), c.end(), back_inserter(v));copy(c.begin(), c.end(), inserter(v, v.end()));✅ Correct!
Correct. back_inserter requires push_back, which associative containers like multiset do not support.❌ Incorrect
Associative containers lack push_back, so back_inserter cannot be used with them.QUESTION 3
Identify the correct way to define a variable to hold the result of calling
find on a map<string, vector> >.vector<int> res = m.find("key");map<string, vector>::iterator it = m.find("key"); >pair<string, vector> p = m.find("key"); >bool found = m.find("key");✅ Correct!
find returns an iterator to the element if found, or m.end() otherwise.❌ Incorrect
The find method always returns an iterator, not the value or a boolean.QUESTION 4
What is the primary advantage of using a
set over a vector for storing a list of excluded words?A set uses less memory than a vector.
A set allows for duplicate excluded words.
A set provides O(log N) lookup instead of O(N) linear search.
A set allows elements to be accessed by index.
✅ Correct!
Using exclude.find(word) on a set is significantly more efficient than std::find on a vector for large datasets.❌ Incorrect
While a vector is simpler, its linear search ($O(N)$) becomes a bottleneck compared to a set's $O(\log N)$.QUESTION 5
In the expression
++word_count.insert({word, 0}).first->second;, what does first refer to?The key of the inserted element.
The boolean indicating if insertion was successful.
An iterator to the element with the given key.
The first element in the entire map.
✅ Correct!
map::insert returns a pair where first is an iterator to the element.❌ Incorrect
The return of insert is a pair<iterator, bool>. first is the iterator.Module Assessment: High-Performance Text Processing
Implementing the Text-Query Architecture
You are tasked with building a search engine component (Exercise 12.28). The system must read a file and allow users to query words, returning the line numbers where those words appear. You must decide on the most efficient container strategy without using custom classes initially.
Q
1. Design the data structure to hold the word-to-line-number mapping. Which combination is most efficient?
Solution:
A
A
map<string, set> > is the ideal choice. The map allows $O(\log N)$ lookup for the word (the key), and the set stores line numbers (values) uniquely and in ascending order, facilitating clean result output.Q
2. Write a code snippet to implement the word counting logic that ignores case and punctuation (Required: 100 words).
Solution:
To implement case-insensitive word counting:
To implement case-insensitive word counting:
for (char &c : word) c = tolower(c);
word.erase(remove_if(word.begin(), word.end(), ispunct), word.end());
++word_count[word];
This logic iterates through each character of the string, converting it to lowercase using tolower. Then, it utilizes the remove_if algorithm with ispunct to shift all punctuation marks to the end of the string, followed by erase to truncate them. Finally, it uses the map's subscript operator to increment the count. This ensures that 'Example!', 'example.', and 'Example' are all normalized to 'example' and counted under a single unique key, maintaining data integrity.Q
3. How would you modify the system to safely share this data with a 'QueryResult' object without duplicating the entire map?
Solution:
Use
Use
shared_ptr. Specifically, the QueryResult should hold a shared_ptr<set> >. By using QueryResult(sought, loc->second, file) where the second argument is a shared pointer to the line number set, the data remains valid even if the original TextQuery object's scope ends, while avoiding expensive deep copies.